ESSNET-SDC Deliverable Blocking Methods for Microdata Protection

Author

  • Josep Domingo-Ferrer
Abstract

Blocking is a well-known technique for partitioning a set of records into several subsets of manageable size. The standard approach is to split the records according to the values of one or several attributes (called blocking attributes). This report presents a new blocking method based on 2-trees for intelligently partitioning very large data sets. Blocking makes sense whenever the treatment to be applied to the data set has superlinear complexity. Here we take microaggregation (which has quadratic complexity) as the treatment, but other superlinear treatments, such as record linkage (also of quadratic complexity), can benefit from the blocking method described.
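
The standard attribute-based approach mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the tree-based method of the report, which the abstract does not specify. The record fields (`region`, `age`, `income`) and the helper names `block_by_attributes` and `pairwise_cost` are hypothetical, chosen only for illustration, and counting record pairs is used here as a rough stand-in for the cost of a quadratic treatment.

```python
from collections import defaultdict


def block_by_attributes(records, blocking_attributes):
    """Partition records into blocks keyed by their blocking-attribute values."""
    blocks = defaultdict(list)
    for record in records:
        key = tuple(record[attr] for attr in blocking_attributes)
        blocks[key].append(record)
    return dict(blocks)


def pairwise_cost(n):
    """Number of record pairs among n records (proxy for quadratic cost)."""
    return n * (n - 1) // 2


# Hypothetical toy data set; field names are assumptions for illustration.
records = [
    {"region": "north", "age": 34, "income": 41000},
    {"region": "north", "age": 51, "income": 38500},
    {"region": "south", "age": 29, "income": 52000},
    {"region": "south", "age": 45, "income": 47000},
    {"region": "south", "age": 38, "income": 61000},
    {"region": "east", "age": 62, "income": 30000},
]

blocks = block_by_attributes(records, ["region"])

# Cost of treating the whole file at once versus each block separately.
cost_unblocked = pairwise_cost(len(records))
cost_blocked = sum(pairwise_cost(len(block)) for block in blocks.values())

print(cost_unblocked, cost_blocked)
```

With six records split into blocks of sizes 2, 3 and 1, the number of record pairs drops from 15 to 4. The saving grows with the size of the data set, which is why blocking pays off only for superlinear treatments.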

Related resources

ESSNET-SDC Deliverable Report on Synthetic Data Files

Publication of synthetic (i.e. simulated) data is an alternative to masking for statistical disclosure control of microdata. The idea is to randomly generate data with the constraint that certain statistics or internal relationships of the original dataset should be preserved. Several approaches for generating synthetic data files are described in this report. The pros and cons of synthetic dat...

Microdata Protection Method Through Microaggregation: A Systematic Approach

Microdata protection in statistical databases has recently become a major societal concern and has been intensively studied in recent years. Statistical Disclosure Control (SDC) is often applied to statistical databases before they are released for public use. Microaggregation for SDC is a family of methods to protect microdata from individual identification. SDC seeks to protect microdata in s...

ESSnet on common tools and harmonised methodology for SDC in the ESS

This overview is meant to briefly outline well-known methods of disclosure control for frequency tables, that is, tables of counts (or percentages) where each cell value represents the number of respondents in that cell. It is mainly a summary / shortened version of chapter 5 of the ESSNET-SDC handbook (Hundepool et al, 2010) contributed by Jane Naylor (ONS). To some extent this paper presents...

Microdata Protection Method Through Microaggregation: A Median-Based Approach

Microaggregation for Statistical Disclosure Control (SDC) is a family of methods to protect microdata from individual identification. SDC seeks to protect microdata in such a way that they can be published and mined without providing any private information that can be linked to specific individuals. The aim of SDC is to modify the original microdata in such a way that the modified data and the orig...

A Survey of Inference Control Methods for Privacy-Preserving Data Mining

Inference control in databases, also known as Statistical Disclosure Control (SDC), is about protecting data so they can be published without revealing confidential information that can be linked to specific individuals among those to which the data correspond. This is an important application in several areas, such as official statistics, health statistics, e-commerce (sharing of consumer data...

Publication date: 2008